-
Notifications
You must be signed in to change notification settings - Fork 468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adapter: Create deterministic log index IDs #30603
adapter: Create deterministic log index IDs #30603
Conversation
496dce0
to
9241cb6
Compare
9241cb6
to
80902a7
Compare
a55213c
to
e7a6c1b
Compare
I have a follow-up PR rebased on top of this PR that adds the new introspection sources back: #30620 |
e7a6c1b
to
5f3738b
Compare
This commit adds a new variant of `GlobalId` and `CatalogItemId` for introspection source indexes. The values of these IDs are deterministically derived from the cluster ID and the log variant. Introspection source indexes are a special edge case of items. They are considered system items, but they are the only system item that can be created by the user at any time. All other system items can only be created by the system during the startup of an upgrade. Previously, it was possible to allocate the same System ID to two different objects if something like the following happened: 1. Materialize version `v` is running in read-write mode. 2. Materialize version `v + 1` starts in read-only mode. 3. The next system item ID is `s`. 4. `v + 1` allocates `s` for a new system item (table, view, introspection source, etc.) 5. `v` creates a new user cluster and allocates `s` through `s + n` to the introspection source indexes in that cluster. At this point we have two separate objects with the same Global ID, which is bad. 6. `v + 1` reboots in read-write mode and allocates `s + n + 1` to the new system item. At this point the new system item has received two different IDs, which is also bad. Putting introspection source index IDs in their own namespace and making them deterministic removes this issue and ones like it. Fixes #MaterializeInc/database-issues/issues/8731
5f3738b
to
868e335
Compare
MitigationsCompleting required mitigations increases Resilience Coverage.
Risk Summary:The pull request has a high-risk score of 83, driven by predictors such as the average line count in files and executable lines within files. Historically, PRs with these predictors are 154% more likely to cause a bug than the repository baseline. Additionally, the repository's observed and predicted bug trends are both decreasing. Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity. Bug Hotspots:
|
@@ -19,6 +19,7 @@ message GlobalId { | |||
uint64 user = 2; | |||
uint64 transient = 3; | |||
google.protobuf.Empty explain = 4; | |||
uint64 introspection_source_index = 5; | |||
} | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooc: Do you know why we have two protobuf definitions for GlobalId
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually have 3, but no I don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have one in the catalog
protobuf objects so types we write for the catalog are isolated from the rest of the system, then another in repr
for the rest of the system. No idea why we have this one in storage-types
though, it can probably be deleted?
src/storage-types/src/instances.rs
Outdated
/// A user storage instance. | ||
User(u64), | ||
User(u32), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is, I think, the scariest change made here. There is some risk that a user creates and drops a cluster in a loop 4 billion times, bricking the system. Given that we only need this for the GlobalId::IntrospectionSourceIndex
generation, and we do manual packing there, and we still have 8 bits left in the representation, could we make this a u40
instead? Or even a u48
, if we make represent the log variant with 8 bits instead of 16.
There is no u48
type of course, so we'd need to continue to using u64
and assert that the high bits are zero. Or do something more exotic like using a third-party crate or storing [u8; 6]
, but I'm not sure that's worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the u40
/u48
idea. I'll try and whip that up. You would probably know the answer to this better than me, is 255 enough for all the introspection sources we ever might have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also if we see people start to get close to u40
/u48
, we could either:
- Go back to a
u64
and increase the size ofGlobalId
. - Convert
GlobalId
to au128
and manually pack it ourselves. That's more than enough space to store all the information we need, even with aCluster: u64
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is 255 enough for all the introspection sources we ever might have?
Currently we have ~30, and every source has some maintenance and implementation overhead, so at least with the current compute logging system I doubt that we'd ever get more than 255 sources. If I'm wrong we still have the option to steal a couple bit from the 8 bits we reserve for the GlobalId variant, so that wouldn't be the end of the world.
Also if we see people start to get close to u40/u48, we could either:
Agreed! The challenge might be noticing that people get close :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! The challenge might be noticing that people get close :)
I'll add an error log when someone fills up 47 bits so we are alerted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, I just increased this to 48 bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look closely at the adapter parts, but this lgtm. I have a preference for having the string representation of these IDs start with an "s", so that you can still use a NOT LIKE 's%'
query to exclude system objects, but there might be reasons why that wouldn't work.
src/repr/src/global_id.rs
Outdated
@@ -92,6 +99,7 @@ impl FromStr for GlobalId { | |||
let val: u64 = s[1..].parse()?; | |||
match s.chars().next().unwrap() { | |||
's' => Ok(GlobalId::System(val)), | |||
'i' => Ok(GlobalId::IntrospectionSourceIndex(val)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make this prefix si
, to make it clear that this is still a system ID? Perhaps a question to bring before the SQL council, if it wasn't already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I brought it up with the council: https://materializeinc.slack.com/archives/C063H5S7NKE/p1732719347690519
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a commit to switch this to si
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Will follow up in SQL council about the string repr of the ids
/// to. | ||
/// Cluster ID Inner Value: A per variant unique number indicating the cluster the index | ||
/// belongs to. | ||
/// Log Variant: A unique number indicating the log variant this index is on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has probably been discussed at length already so no need to change, but it's a bit surprising to me that we're only leaving 8 bits for the log variant especially when we already have 31 variants. This feels like something we could realistically run up against in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily at length, but we have discussed it: #30603 (comment).
In general if we need to increase the size of any of these fields in the future, then we can do a similar migration where we change the representation of GlobalId and re-assign every ID to introspection source indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we wouldn't even need a migration, just changing the packing code to spill the log variant into free bits of the cluster ID variant should be enough. But yeah the important thing is that the number of log variants is under our control, in contrast to the number of clusters, so I'm much less worried about the former.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh, both comments make sense to me and alleviate all concerns, thanks!
@@ -19,6 +19,7 @@ message GlobalId { | |||
uint64 user = 2; | |||
uint64 transient = 3; | |||
google.protobuf.Empty explain = 4; | |||
uint64 introspection_source_index = 5; | |||
} | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have one in the catalog
protobuf objects so types we write for the catalog are isolated from the rest of the system, then another in repr
for the rest of the system. No idea why we have this one in storage-types
though, it can probably be deleted?
const WARN_MASK: u64 = 1 << 47; | ||
if MASK & id == 0 { | ||
if WARN_MASK & id != 0 { | ||
error!("{WARN_MASK} or more `StorageInstanceId`s allocated, we will run out soon"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe make this a soft assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think a soft assert gives us anything additional. There's nothing incorrect about this being true and it's not something we want to avoid in tests. We just want to be warned if it happens in prod so we can start coming up with a plan to increase the width of cluster IDs.
@ParkMyCar will you take another look at the commits since you're last review? I haven't force pushed so you should be able to review those in isolation if you want. |
@jkosh44 looking now! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, except for the comment on parsing
let variant = match tag { | ||
's' => { | ||
if Some('i') == s.chars().next() { | ||
s = &s[1..]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be:
s = &s[1..]; | |
s = &s[2..]; |
Since the first two bytes are si
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this is correct as is. We've already consumed the first byte above on line 75 s = &s[1..];
. We just need to consume an extra byte for this variant because it has an extra letter compared to the rest.
_ => return Err(anyhow!("couldn't parse id {}", s)), | ||
}; | ||
let val: u64 = s.parse()?; | ||
Ok(variant(val)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woah, I had no idea you could construct enums like this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding that all enum variants are actually just functions under the hood. So the type of variant
is actually fn(u64) -> CatalogItemId
. Then variant(val)
is a normal function call that returns a CatalogItemId
.
let variant = match tag { | ||
's' => { | ||
if Some('i') == s.chars().next() { | ||
s = &s[1..]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
This commit adds a new variant of `GlobalId` and `CatalogItemId` for introspection source indexes. The values of these IDs are deterministically derived from the cluster ID and the log variant. Introspection source indexes are a special edge case of items. They are considered system items, but they are the only system item that can be created by the user at any time. All other system items can only be created by the system during the startup of an upgrade. Previously, it was possible to allocate the same System ID to two different objects if something like the following happened: 1. Materialize version `v` is running in read-write mode. 2. Materialize version `v + 1` starts in read-only mode. 3. The next system item ID is `s`. 4. `v + 1` allocates `s` for a new system item (table, view, introspection source, etc.) 5. `v` creates a new user cluster and allocates `s` through `s + n` to the introspection source indexes in that cluster. At this point we have two separate objects with the same Global ID, which is bad. 6. `v + 1` reboots in read-write mode and allocates `s + n + 1` to the new system item. At this point the new system item has received two different IDs, which is also bad. Putting introspection source index IDs in their own namespace and making them deterministic removes this issue and ones like it. Fixes #MaterializeInc/database-issues/issues/8731
This commit adds a new variant of
GlobalId
andCatalogItemId
forintrospection source indexes. The values of these IDs are
deterministically derived from the cluster ID and the log variant.
Introspection source indexes are a special edge case of items. They are
considered system items, but they are the only system item that can be
created by the user at any time. All other system items can only be
created by the system during the startup of an upgrade.
Previously, it was possible to allocate the same System ID to two
different objects if something like the following happened:
v
is running in read-write mode.v + 1
starts in read-only mode.s
.v + 1
allocatess
for a new system item (table, view,introspection source, etc.)
v
creates a new user cluster and allocatess
throughs + n
tothe introspection source indexes in that cluster. At this point we
have two separate objects with the same Global ID, which is bad.
v + 1
reboots in read-write mode and allocatess + n + 1
to thenew system item. At this point the new system item has received two
different IDs, which is also bad.
Putting introspection source index IDs in their own namespace and
making them deterministic removes this issue and ones like it.
Fixes #MaterializeInc/database-issues/issues/8731
Motivation
This PR fixes a recognized bug.
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.